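Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and extreme deformations, enabling them to navigate unstructured terrain and even survive harsh impacts. However, they are hard to control because of their high dimensionality, complex dynamics, and coupled architecture. Physics-based simulation is one avenue for developing locomotion policies that can then be transferred to real robots, but modeling tensegrity robots is a complex task, so simulations suffer from a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real strategy for tensegrity robots. The strategy is based on a differentiable physics engine that can be trained with limited data from a real robot (i.e., offline measurements and one random trajectory) and reaches an accuracy high enough to discover transferable locomotion policies. Beyond the overall pipeline, the key contributions of this work include computing non-zero gradients at contact points, a loss function, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. The proposed pipeline is demonstrated and evaluated on a real 3-bar tensegrity robot.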
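Few-shot segmentation of point clouds remains a challenging task, because there is no effective way to convert local point cloud information into a global representation, which hinders the generalization ability of point features. In this study, we propose a bidirectional feature globalization (BFG) approach, which leverages the similarity measurement between point features and prototype vectors to embed global perception into local point features in a bidirectional fashion. With point-to-prototype globalization (Po2PrG), BFG aggregates local point features into prototypes according to similarity weights from dense point features to sparse prototypes. With prototype-to-point globalization (Pr2PoG), global perception is embedded into local point features based on similarity weights from sparse prototypes to dense point features. The sparse prototypes of each class, embedded with global perception, are summarized into a single prototype for few-shot 3D segmentation based on the metric learning framework. Extensive experiments on S3DIS and ScanNet demonstrate that BFG significantly outperforms state-of-the-art methods.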
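The two globalization directions described above can be illustrated with a minimal sketch. It assumes cosine similarity followed by a softmax as the similarity weighting, and a residual addition when embedding prototype context into points. The abstract does not specify these choices, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def point_to_prototype(points, prototypes):
    """Dense-to-sparse: each prototype becomes a similarity-weighted
    average of all point features (assumed weighting: softmax of cosine)."""
    out = []
    for p in prototypes:
        w = softmax([cosine(p, x) for x in points])
        out.append([sum(w[i] * points[i][d] for i in range(len(points)))
                    for d in range(len(p))])
    return out

def prototype_to_point(points, prototypes):
    """Sparse-to-dense: each point receives a similarity-weighted mix of
    prototypes, added residually (the residual form is an assumption)."""
    out = []
    for x in points:
        w = softmax([cosine(x, p) for p in prototypes])
        ctx = [sum(w[k] * prototypes[k][d] for k in range(len(prototypes)))
               for d in range(len(x))]
        out.append([xi + ci for xi, ci in zip(x, ctx)])
    return out
```

Calling `point_to_prototype` and then `prototype_to_point` on the same feature set gives one bidirectional pass: prototypes first absorb information from every point, and each point then receives that global context back.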
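Classification of airborne laser scanning (ALS) point clouds is a critical task in the remote sensing and photogrammetry fields. Although recent deep learning-based methods have achieved satisfactory performance, they ignore the unicity of the receptive field, which leaves ALS point cloud classification challenging when distinguishing areas with complex structures and extreme scale variations. In this paper, with the objective of configuring multi-receptive field features, we propose a novel receptive field fusion-and-stratification network (RFFS-Net). With a novel dilated graph convolution (DGConv) and its extension, annular dilated convolution (ADConv), as basic building blocks, the receptive field fusion process is implemented with the dilated and annular graph fusion (DAGFusion) module, which obtains multi-receptive field feature representations by capturing dilated and annular graphs with various receptive regions. The stratification of the receptive fields, with point sets of different resolutions as the calculation bases, is performed by multi-level decoders nested in RFFS-Net and driven by the multi-level receptive field aggregation loss (MRFALoss), which pushes the network to learn in the direction of supervision labels with different resolutions. With receptive field fusion-and-stratification, RFFS-Net is more adaptable to the classification of regions with complex structures and extreme scale variations in large-scale ALS point clouds. Evaluated on the ISPRS Vaihingen 3D dataset, RFFS-Net significantly outperforms the baseline approach, by 5.3% on mF1 and also on mIoU, reaching an overall accuracy of 82.1%, an mF1 of 71.6%, and an mIoU of 58.2%. Furthermore, experiments on the LASDU dataset and the 2019 IEEE-GRSS Data Fusion Contest dataset show that RFFS-Net achieves new state-of-the-art classification performance.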
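Semantic segmentation of point clouds produces a comprehensive understanding of scenes by densely predicting the category of each point. Because of the unicity of the receptive field, semantic segmentation of point clouds remains challenging with respect to expressing multi-receptive field features, which leads to the misclassification of instances with similar spatial structures. In this paper, we propose DGFA-Net, a graph convolutional network rooted in dilated graph feature aggregation (DGFA) and guided by a multi-basis aggregation loss (MALoss) calculated through pyramid decoders. To configure multi-receptive field features, DGFA, which takes the proposed dilated graph convolution (DGConv) as its basic building block, is designed to aggregate multi-scale feature representations by capturing dilated graphs with various receptive regions. By simultaneously penalizing the receptive field information with point sets of different resolutions as calculation bases, we introduce pyramid decoders driven by MALoss to capture the diversity of receptive field bases. Combining these two aspects, DGFA-Net significantly improves segmentation performance on instances with similar spatial structures. Experiments on S3DIS, ShapeNetPart, and Toronto-3D show that DGFA-Net outperforms the baseline approach and achieves new state-of-the-art segmentation performance.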
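The idea of capturing dilated graphs with various receptive regions can be sketched as follows. This is a minimal illustration that assumes the common dilated k-nearest-neighbor construction (keep every d-th of the k·d nearest points) and max pooling as the aggregation. The abstract does not give DGConv's exact definition, so both choices and all function names are assumptions.

```python
def dilated_knn(points, center_idx, k, dilation):
    """Return indices of k neighbors of points[center_idx], taking every
    `dilation`-th entry among the k * dilation nearest points.
    A larger dilation widens the receptive region at the same neighbor count."""
    center = points[center_idx]

    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], center))

    order = sorted((i for i in range(len(points)) if i != center_idx), key=sqdist)
    return order[: k * dilation : dilation]

def aggregate_multi_scale(points, feats, center_idx, k, dilations):
    """Max-pool neighbor features for each dilation rate and concatenate
    the results into one multi-receptive-field feature vector."""
    dim = len(feats[0])
    out = []
    for d in dilations:
        nbrs = dilated_knn(points, center_idx, k, d)
        out.extend(max(feats[j][c] for j in nbrs) for c in range(dim))
    return out
```

With `dilations=[1, 2]`, the first half of the output summarizes the immediate neighborhood, while the second half summarizes a neighborhood roughly twice as wide that uses the same number of neighbors.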
Audio-visual scene understanding is a challenging problem because of the unstructured spatial-temporal relations in audio signals and the spatial layouts of different objects and varied texture patterns in visual images. Recently, many studies have focused on abstracting features from convolutional neural networks, while the learning of explicit, semantically relevant frames of sound signals and visual images has been overlooked. To this end, we present an end-to-end framework, the attentional graph convolutional network (AGCN), for structure-aware audio-visual scene representation. First, the sound spectrogram and the input image are processed by a backbone network for feature extraction. Then, to build multi-scale hierarchical information from the input features, we use an attention fusion mechanism to aggregate features from multiple layers of the backbone network. Notably, to represent the salient regions and contextual information of the audio-visual inputs, a salient acoustic graph (SAG), a contextual acoustic graph (CAG), a salient visual graph (SVG), and a contextual visual graph (CVG) are constructed for the audio-visual scene representation. Finally, the constructed graphs pass through a graph convolutional network for structure-aware audio-visual scene recognition. Extensive experiments on audio, visual, and audio-visual scene recognition datasets show that AGCN achieves promising results. Visualizations of the graphs on the spectrograms and images show that the proposed CAG/SAG and CVG/SVG focus on the salient and semantically relevant regions.
The mitotic nuclei count is one of the important indicators for the pathological diagnosis of breast cancer. Manual annotation requires experienced pathologists and is very time-consuming and inefficient. With the development of deep learning methods, some models with good performance have emerged, but their generalization ability needs further strengthening. In this paper, we propose a two-stage mitosis segmentation and classification method named SCMitosis. First, segmentation with a high recall rate is achieved by the proposed depthwise separable convolution residual block and channel-spatial attention gate. Then, a classification network is cascaded to further improve the detection of mitotic nuclei. The proposed model is verified on the ICPR 2012 dataset, where it obtains an F-score of 0.8687, the highest among the current state-of-the-art algorithms compared. In addition, the model also performs well on the GZMH dataset, which was prepared by our group and will be released for the first time with the publication of this paper. The code will be available at: https://github.com/antifen/mitosis-nuclei-segmentation.
Recently, spoken dialogue systems have been widely deployed in a variety of applications, serving a huge number of end-users. A common issue is that the errors resulting from noisy utterances, semantic misunderstandings, or lack of knowledge make it hard for a real system to respond properly, possibly leading to an unsatisfactory user experience. To avoid such a case, we consider a proactive interaction mechanism where the system predicts the user satisfaction with the candidate response before giving it to the user. If the user is not likely to be satisfied according to the prediction, the system will ask the user a suitable question to determine the real intent of the user instead of providing the response directly. With such an interaction with the user, the system can give a better response to the user. Previous models that predict the user satisfaction are not applicable to DuerOS which is a large-scale commercial dialogue system. They are based on hand-crafted features and thus can hardly learn the complex patterns lying behind millions of conversations and temporal dependency in multiple turns of the conversation. Moreover, they are trained and evaluated on the benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system. To face these challenges, we propose a pipeline to predict the user satisfaction to help DuerOS decide whether to ask for clarification in each turn. Specifically, we propose to first generate a large number of weak labels and then train a transformer-based model to predict the user satisfaction with these weak labels. Empirically, we deploy and evaluate our model on DuerOS, and observe a 19% relative improvement on the accuracy of user satisfaction prediction and 2.3% relative improvement on user experience.
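The proactive interaction mechanism described above amounts to a per-turn decision rule, sketched below. The function name, the fixed threshold of 0.5, and the predictor interface are hypothetical. The abstract only states that the system asks a clarifying question when the user is unlikely to be satisfied.

```python
def next_turn(query, candidate_response, clarifying_question,
              predict_satisfaction, threshold=0.5):
    """Return the candidate response if the predicted satisfaction is high
    enough; otherwise ask the clarifying question instead.

    `predict_satisfaction(query, response)` stands in for the trained model
    and is assumed to return a score in [0, 1]."""
    score = predict_satisfaction(query, candidate_response)
    if score >= threshold:
        return candidate_response
    return clarifying_question
```

In this sketch the satisfaction predictor is passed in as a callable, so the decision rule can be exercised with a stub before any real model is attached.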
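Unsupervised hashing has attracted much attention for binary representation learning, owing to the need for economical storage and the efficiency of binary codes. It aims to encode high-dimensional features in the Hamming space while preserving similarity between instances. However, most existing methods learn hash functions with manifold-based approaches. These methods capture the local geometric structures (i.e., pairwise relationships) of the data, and they perform unsatisfactorily in real-world scenarios that produce similar features (e.g., color and shape) with different semantic information. To address this challenge, we propose an effective unsupervised method, Jointly Personalized Sparse Hashing (JPSH), for binary representation learning. Specifically, we first propose a novel personalized hashing module, Personalized Sparse Hashing (PSH). Different personalized subspaces are constructed to reflect category-specific attributes for different clusters, adaptively mapping instances within the same cluster to the same Hamming space. In addition, we impose sparse constraints on the different personalized subspaces to select important features. We also draw on the strengths of the other clusters when building the PSH module, to avoid over-fitting. Then, to preserve semantic and pairwise similarities simultaneously in JPSH, we incorporate PSH and manifold-based hash learning into a seamless formulation. As a result, JPSH not only distinguishes instances from different clusters but also preserves local neighborhood structures within each cluster. Finally, an alternating optimization algorithm is adopted to iteratively obtain analytical solutions of the JPSH model. Extensive experiments on four benchmark datasets verify that JPSH outperforms several hashing algorithms on the similarity search task.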
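The retrieval setting that hashing targets, encoding features as binary codes and ranking by Hamming distance, can be sketched generically. The encoder below is a plain sign-of-projection hash with hand-picked projection rows, used only as an illustrative stand-in. It is not JPSH's learned personalized hash functions, and all names are hypothetical.

```python
def hash_code(x, projections):
    """Encode a real-valued vector as bits: one bit per projection row,
    set to 1 when the dot product is non-negative."""
    return [1 if sum(w * v for w, v in zip(row, x)) >= 0 else 0
            for row in projections]

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(p != q for p, q in zip(a, b))

def search(query_code, database_codes, top_k):
    """Indices of the top_k database codes closest to the query in
    Hamming distance (ties keep database order)."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:top_k]
```

Because comparison reduces to counting differing bits, similarity search over such codes is cheap in both storage and time, which is the motivation the abstract gives for binary representation learning.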
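Named entity recognition (NER) is the task of detecting and classifying entity spans in text. When entity spans overlap one another, the problem is called nested NER. Span-based methods have been widely used to tackle nested NER. Most of these methods produce an $n \times n$ score matrix, where $n$ is the length of the sentence and each entry corresponds to a span. However, previous work ignores the spatial relations in the score matrix. In this paper, we propose using a convolutional neural network (CNN) to model these spatial relations in the score matrix. Despite its simplicity, experiments on three commonly used nested NER datasets show that our model surpasses several recently proposed methods that use the same pre-trained encoders. Further analysis shows that using a CNN helps the model find nested entities more accurately. In addition, we found that different papers used different sentence tokenizations for the three nested NER datasets, which affects the comparison. We therefore release a pre-processing script to facilitate future comparisons.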
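The span score matrix and the role of a convolution over it can be sketched in plain Python. This is a simplified, single-channel illustration with a fixed kernel and a residual connection. The actual model presumably operates on multi-channel learned features from a pre-trained encoder, and the function names and the residual form are assumptions.

```python
def conv2d_same(matrix, kernel):
    """Single-channel 2D convolution (cross-correlation, as in deep learning
    libraries) with zero padding, so the output has the input's shape.
    The kernel must have odd height and width."""
    n, m = len(matrix), len(matrix[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < n and 0 <= c < m:
                        acc += kernel[a][b] * matrix[r][c]
            out[i][j] = acc
    return out

def refine_span_scores(scores, kernel):
    """Mix each span's score with those of neighboring spans, then add the
    result back residually. Entry (i, j) scores the span from token i to
    token j, so only the upper triangle (i <= j) is kept."""
    mixed = conv2d_same(scores, kernel)
    n = len(scores)
    return [[scores[i][j] + mixed[i][j] if i <= j else 0.0 for j in range(n)]
            for i in range(n)]
```

Neighboring entries of the matrix correspond to spans that share a boundary or differ by one token, so a 3x3 kernel lets each span's score depend on those of its containing, contained, and adjacent spans.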
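AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on multiple sequence alignments (MSAs) and templates as inputs to learn co-evolution information from homologous sequences. However, searching protein databases for MSAs and templates is time-consuming, usually taking dozens of minutes. We therefore explore the limits of fast protein structure prediction using only the primary sequences of proteins. HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. The method first pre-trains a large-scale protein language model (PLM) on thousands of millions of primary sequences under the self-supervised learning paradigm. The PLM serves as an alternative to MSAs and templates for learning co-evolution information. Then, by combining the pre-trained PLM with the essential components of AlphaFold2, we obtain an end-to-end differentiable model that predicts the 3D coordinates of atoms from the primary sequence alone. HelixFold-Single is validated on the CASP14 and CAMEO datasets, achieving accuracy competitive with MSA-based methods on targets with large homologous families. Furthermore, HelixFold-Single takes much less time than the mainstream pipelines for protein structure prediction, demonstrating its potential in tasks that require many predictions. The code of HelixFold-Single is available at https://github.com/paddlepaddle/paddlehelix/tree/dev/dev/pprotein_folding/helixfold-single, and we also provide stable web services at https://paddlehelix.baidu.com/app/drug/protein-single/prevast.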